instruction sequence. When both instructions are within the subset and there are no dependencies, the first and second
instructions can be issued in parallel in the first and second pipelines. 
MICROPROCESSOR WITH APPARATUS FOR
PARALLEL EXECUTION OF INSTRUCTIONS 

FIELD OF THE INVENTION

The present invention relates generally to the field of computers; in
particular, the invention relates to microprocessor architecture and to ways
of improving the speed at which instructions are executed.

BACKGROUND OF THE INVENTION

Computers have historically been designed to execute instructions
sequentially; that is, one after another. While sequential execution of
computer instructions does provide a logical and orderly method of
operation, the ever-present demand to increase processing speed has led
researchers to explore ways of implementing a parallel execution scheme.

There are numerous problems which must be overcome if one is to
successfully design a computer or microprocessor which is capable of
executing multiple instructions in parallel. For example, microprocessors
typically have an instruction set architecture which includes hundreds of
individual instructions. Counting all of the different kinds of addressing
modes for a given architecture, the total number of possible opcodes is likely
to number somewhere in the thousands. Pairing all of the thousands of
possible first instructions with all the possible second instructions for a given
instruction set could easily result in millions of different combinations.
Designing a machine which is capable of executing all of these various
combinations is a formidable task. It is appreciated that the design
complexity can be so great that such a problem becomes unmanageable.
Building several decoders which could decode the complete instruction set
in a parallel machine which could execute instruction pairs without large
time delays is problematic.

Another associated problem with building a computer capable of
parallel execution of instructions is that it must also be able to run software
which is designed for prior art machines; that is, machines which operate by
sequential execution of instructions, one instruction per clock cycle. In other
words, a parallel machine must give the appearance of sequential
operation.


As will be seen, the present invention discloses a computer system
capable of executing two instructions in a single clock cycle. The invention
operates by decoding a pair of instructions selected from a given instruction
set and then executing them in parallel to get a correct result. One of the
salient features of the present invention is that the computer system only
issues two instructions in parallel if there are no register dependencies
between the paired instructions.
SUMMARY OF THE INVENTION

A computer system capable of executing two instructions in parallel in
a single clock cycle is disclosed. The computer system comprises a dual
instruction decoder which only issues two instructions in parallel if there are
no register dependencies between the instructions, and both instructions fall
within a predetermined subset of the computer's instruction set.

In one embodiment, the present invention includes first and second
instruction pipeline means for execution of computer instructions. The first
pipeline means is operable to execute any instruction issued from the full
instruction set, while the second pipeline means is restricted to executing a
predetermined subset of instructions. The subset is selected based on which
instructions are most commonly executed.

A register dependency checking means is included for identifying the
destination register of the first instruction in a sequence of instructions. The
dependency checking means also determines whether the destination
register is used during the execution of the second instruction of the
sequence. If not, the dependency checking means indicates that a first
condition is met. Also included is a means for determining whether the first
and second instructions in the sequence are within the predetermined
subset. When both instructions are within the subset, the determining
means indicates that a second condition is met. Whenever both the first and
second conditions are satisfied, the pair of instructions can be issued in
parallel.

Another feature of the invention is that the computer system defaults
to issuing only the first instruction in the sequence whenever either of the
first or second conditions is not met. That is, if the first and second
instructions have a register dependency, or one of the instructions is not
taken from the predetermined subset, then the machine defaults to a
condition wherein only the first instruction is executed in a single clock cycle.
For this condition, the second instruction in the sequence is issued during
the next clock cycle.




BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood more fully from the detailed
description given below and from the accompanying drawings of the
preferred embodiment of the invention, which, however, should not be taken
to limit the invention to the specific embodiment, but are for explanation and
understanding only.


Figure 1 illustrates the central processing unit (CPU) pipeline execution
structure of a prior art microprocessor.

Figure 2 illustrates the CPU pipeline execution structure of the
present invention. 

Figure 3 is a conceptual block diagram of the dual instruction 
decoder apparatus incorporated within the present invention. 
DETAILED DESCRIPTION

A microprocessor with an apparatus for executing two instructions in 
parallel during a single clock cycle is disclosed. In the following description, 
numerous specific details are set forth, such as specific instruction types, 
microprocessor types, etc., in order to provide a thorough understanding of the
preferred embodiment of the present invention. It will be obvious, however, to
one skilled in the art that the present invention may be practiced without these 
specific details. In other instances, well-known circuits, structures and 
methods have not been shown in detail in order to avoid unnecessarily 
obscuring the present invention. 

The invention covers an apparatus and method for executing multiple
computer instructions in parallel in a single clock cycle. Preferably, the
invention is embodied in a microprocessor known as the i586™
microprocessor, which is manufactured by Intel Corporation. The i586™
microprocessor is an improved version of Intel's i486™ microprocessor. Details
of the architecture of the i486™ microprocessor are described in numerous
publications. (Intel, i486™ and i586™ are trademarks of Intel Corporation.)
Although frequent reference will be made to the i586™ architecture in the
specification, and examples provided from the x86 family of instructions, it is
appreciated that the present invention is not limited to these specific machines.

PIPELINING 

Pipelining is an implementation technique whereby multiple
instructions are simultaneously overlapped in execution. Pipelining is a
widely used prior art technique for improving the efficiency and execution
speed of the central processing unit (CPU). In a pipeline structure,
instructions enter at one end, are processed through the stages or pipe
segments, and exit at the other end. Each of the stages of the pipeline
completes a part of the instruction.



With reference to Figure 1, a prior art pipeline structure is illustrated in
which the stages of the instruction are denoted by the entries along the
left-hand column. The clock time intervals between instruction steps are
illustrated by the horizontal numbers. Each step in the pipeline is referred to
as a clock cycle or machine cycle.

The first stage of the pipeline is the "PF" stage, which denotes the
prefetch portion of the pipeline. In this stage, instructions are prefetched from
an on-chip cache memory. The next pipe stage is denoted "D1". In this
pipeline stage instructions are decoded and issued. The D2 stage is an
address computation stage. Note that in accordance with pipeline principles,
while the first instruction (e.g., I1) is being processed in the D1 stage of the
second clock cycle, a second instruction (e.g., I2) begins its prefetch
stage. The "EX" stage of the pipeline indicates the execution of the
instruction in hardware, while the "WB" stage denotes a writeback operation.

Note that in the prior art structure of Figure 1, only a single instruction
occupies each pipeline stage in any given clock cycle.
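The single-issue timing of Figure 1 can be tabulated with a short Python sketch. It assumes an ideal pipeline with no stalls and uses the stage names given above; the function name `single_issue_schedule` is illustrative and not taken from the disclosure.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def single_issue_schedule(num_instructions):
    """Map each clock to {stage: instruction} for an ideal, stall-free scalar pipeline.

    Assumption: instruction Ik enters PF in clock k and moves forward one stage
    per clock, so each stage holds at most one instruction in any given clock.
    """
    schedule = {}
    for k in range(1, num_instructions + 1):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(k + offset, {})[stage] = f"I{k}"
    return schedule


if __name__ == "__main__":
    sched = single_issue_schedule(4)
    for clock in sorted(sched):
        cells = "  ".join(f"{s}:{sched[clock].get(s, '--'):>3}" for s in STAGES)
        print(f"clock {clock}: {cells}")
```

For four instructions, I1 sits in D1 while I2 is in PF during clock 2, and the last writeback occurs in clock 8, i.e. n + 4 clocks for n instructions.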

As discussed earlier, the present invention represents a superscalar
machine, which is capable of executing two instructions in parallel during one
clock cycle. To accomplish this purpose, the present invention contains two
integer pipelines, each of which is capable of executing instructions in a
single clock. Thus, the CPU can issue two instructions in parallel in two
separate pipelines. In the currently preferred embodiment, the pipelines are
referred to as the "u" and "v" pipes. The u-pipe can preferably execute any
instruction in the x86 architecture. The v-pipe can execute certain simple
instructions, as defined further in a later section of the specification.

With reference now to Figure 2, the pipeline structure of the present
invention is illustrated. Note that in the pipeline sequence of Figure 2, two
instructions, I1 and I2, are shown being executed at each stage of the
pipeline in a single clock cycle. Once again, the first stage of the pipeline is
the prefetch stage, during which instructions are prefetched from the on-chip
cache. Because the presently invented microprocessor has separate
caches for instructions and data, prefetches no longer conflict with data
references for access to the cache, as was the case for prior art CPUs. This
means that during the prefetch stage the instructions I1 and I2 are fetched
directly from the instruction cache and loaded into the u- and v-pipes. In the
next pipe stage (i.e., D1) the instructions I1 and I2 are decoded and issued.
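A comparable sketch for the dual-pipeline structure of Figure 2 places two instructions in each stage per clock. It assumes, optimistically, that every adjacent pair is pairable and that nothing stalls; `dual_issue_schedule` is a hypothetical name.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def dual_issue_schedule(num_instructions):
    """Map each clock to {stage: (u_instr, v_instr)} for an ideal dual-issue pipeline.

    Assumptions: every adjacent pair is pairable and no stalls occur. When the
    sequence length is odd, the last instruction travels alone (v slot is None).
    """
    schedule = {}
    for pair_no, first in enumerate(range(1, num_instructions + 1, 2), start=1):
        u = f"I{first}"
        v = f"I{first + 1}" if first + 1 <= num_instructions else None
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(pair_no + offset, {})[stage] = (u, v)
    return schedule
```

Under these assumptions four instructions complete their writeback in clock 6, whereas a single-issue pipeline of the same depth would need clock 8. Real code pairs less often, as the pairing rules below explain.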

INSTRUCTION ISSUANCE AND PIPELINE SEQUENCING

As stated earlier, the invented microprocessor can issue one or two
instructions in a single clock cycle. In order to issue two instructions
simultaneously, however, both instructions in the pair must satisfy certain
conditions. That is, both instructions in the pair must be within a predefined
subset of instructions and free of interdependencies. (This aspect of the
invention will be discussed in more detail shortly.)

The process of issuing two instructions in parallel is referred to as
"instruction pairing". When instructions are paired, the instruction issued to
the v-pipe (second pipe) is always the next sequential instruction after the one
issued to the u-pipe. Although the instructions may execute in parallel, the
behavior as seen by the programmer is exactly the same as if they were
executed sequentially (as would be the case for prior art designs).
Instructions proceed in parallel through the D2 and EX stages to their
completion in the WB stage. During their progression through the pipeline, it
is appreciated that instructions may be stalled for any number of reasons.
When an instruction in the u-pipe is delayed, for example, then the instruction
(if any) issued with it to the v-pipe is also delayed at the same pipeline stage.
No successive instructions are allowed to advance to the stalled stage of
either pipeline. When an instruction in the v-pipe is stalled, the instruction
issued with it in the u-pipe is allowed to advance, while the v-pipe is stalled.
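The stall rules above can be summarized for one pipeline stage holding a u/v pair. This is a minimal sketch that models only the three stated rules and ignores what causes a stall; `StageDecision` and `stage_decision` are hypothetical names, and reading "no successive instructions" as blocking entry into that stage in both pipes is an interpretation.

```python
from typing import NamedTuple


class StageDecision(NamedTuple):
    u_advances: bool        # may the u-pipe instruction leave this stage?
    v_advances: bool        # may the paired v-pipe instruction leave this stage?
    successors_enter: bool  # may the next pair move into this stage?


def stage_decision(u_stalled: bool, v_stalled: bool) -> StageDecision:
    """Apply the stated stall rules to one u/v pair occupying the same stage."""
    # A u-pipe stall also holds the v-pipe instruction issued with it.
    u_adv = not u_stalled
    # A v-pipe stall holds only the v-pipe; its u-pipe partner may move on.
    v_adv = not u_stalled and not v_stalled
    # Interpretation: no successor may enter a stage stalled in either pipe.
    successors = not (u_stalled or v_stalled)
    return StageDecision(u_adv, v_adv, successors)
```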



INSTRUCTION PAIRING 

The basic idea of the present invention is that the computer system
includes a decoding apparatus which only issues two instructions in parallel if
there are no register dependencies between them, and if both instructions
belong to a subset of instructions eligible for parallel execution. The dual
instruction decoder first identifies the destination register for the first instruction
in the program sequence. This instruction becomes the u-pipe instruction.
The invented apparatus then determines whether the destination register of
the u-pipe instruction is used in any way during the execution of the second
instruction in the sequence. If it is not (i.e., the two instructions are
independent), then both instructions are issued in parallel.
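The two conditions can be expressed as a small decision function. The sketch below assumes the decoders have already produced a mnemonic, a destination register and the set of registers each instruction touches; the `Decoded` record, the tiny subset and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Decoded:
    """Hypothetical summary of what the decoders extract from one instruction."""
    mnemonic: str
    dest: Optional[str]                       # destination register, if any
    regs_used: FrozenSet[str] = frozenset()   # every register read or written


def instructions_to_issue(first: Decoded, second: Decoded, subset) -> int:
    """Return 2 when the pair may be dual-issued, otherwise 1 (first instruction only)."""
    # First condition: the u-pipe destination is not used by the second instruction.
    independent = first.dest is None or first.dest not in second.regs_used
    # Second condition: both instructions belong to the pairable subset.
    both_in_subset = first.mnemonic in subset and second.mnemonic in subset
    return 2 if independent and both_in_subset else 1


if __name__ == "__main__":
    subset = {"mov", "add"}  # tiny illustrative subset, not Table 1
    mov_edx = Decoded("mov", "edx", frozenset({"edx", "ebx"}))
    print(instructions_to_issue(mov_edx, Decoded("add", "esi", frozenset({"esi"})), subset))  # 2
    print(instructions_to_issue(mov_edx, Decoded("add", "edx", frozenset({"edx"})), subset))  # 1
```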

As discussed earlier, the superscalar machine of the present invention
includes two parallel pipes, called the u- and v-pipes, which exploit
parallelism within the complete instruction set. The instruction unit of the
microprocessor always issues the first instruction in the instruction sequence
to the u-pipe, and the second instruction to the v-pipe. The v-pipe stalls
whenever the u-pipe operand is not accessible or there is an address collision
between the pipes. Pairing can only occur between two integer instructions,
or between two floating point instructions.

In general, pairs of simple instructions may be paired as long as there
are no dependencies between them. To issue two integer instructions
simultaneously, the following conditions have to be satisfied according to the
currently preferred embodiment. First, each instruction must belong to a
predetermined subset of the x86 instruction set. The instruction subset for
pairing of integer instructions is shown below in Table 1.
Table 1

Integer Instruction Subset For Pairing

u-pipe instructions               | v-pipe instructions
mov r, r     alu r, i    push r   | mov r, r     alu r, i    push r
mov r, m     alu m, i    push i   | mov r, m     alu m, i    push i
mov m, r     alu eax, i  pop r    | mov m, r     alu eax, i  pop r
mov r, i     alu m, r    nop      | mov r, i     alu m, r    jmp near
mov m, i     alu r, m             | mov m, i     alu r, m    jcc near
mov eax, m   inc/dec r            | mov eax, m   inc/dec r   0F jcc
mov m, eax   inc/dec m            | mov m, eax   inc/dec m   call near
alu r, r     lea r, m             | alu r, r     lea r, m    nop


(Note that in Table 1, the entry "alu r, r" denotes a class of instructions
that comprises such instructions as "add", "or", "adc", "sbb", "and", "xor" and "cmp".)
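For illustration, the entries of Table 1 can be encoded as lookup sets. The sketch assumes that r, m and i denote register, memory and immediate operands, and it checks table membership only; the further implementation restrictions listed later (for example, no ADC or SBB in the v-pipe) are not applied here. All names are hypothetical.

```python
# Assumed reading of Table 1: r = register, m = memory, i = immediate.
COMMON = {
    "mov r,r", "mov r,m", "mov m,r", "mov r,i", "mov m,i", "mov eax,m", "mov m,eax",
    "alu r,r", "alu r,i", "alu m,i", "alu eax,i", "alu m,r", "alu r,m",
    "inc/dec r", "inc/dec m", "lea r,m", "push r", "push i", "pop r", "nop",
}
U_PIPE = frozenset(COMMON)
# Near jumps, conditional jumps and near calls appear only in the v-pipe column.
V_PIPE = frozenset(COMMON | {"jmp near", "jcc near", "0F jcc", "call near"})

# Mnemonics of the "alu" class, as listed in the note to Table 1.
ALU_CLASS = {"add", "or", "adc", "sbb", "and", "xor", "cmp"}


def table_key(mnemonic: str, form: str = "") -> str:
    """Map a mnemonic and operand form to a Table 1 entry, e.g. ('add', 'r,i') -> 'alu r,i'."""
    if mnemonic in ALU_CLASS:
        mnemonic = "alu"
    elif mnemonic in ("inc", "dec"):
        mnemonic = "inc/dec"
    return f"{mnemonic} {form}" if form else mnemonic


def pairable_in(pipe: str, mnemonic: str, form: str = "") -> bool:
    """True if the instruction appears in the Table 1 column for pipe 'u' or 'v'."""
    table = U_PIPE if pipe == "u" else V_PIPE
    return table_key(mnemonic, form) in table
```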

The idea of subsetting is an important concept in the present invention
since it radically reduces the number of possible combinations of different
instructions that would have to be handled during parallel execution. The
system recognizes that there is a small subset of instructions (approximately
twenty) which accounts for nearly 95% of all the instructions executed by typical
software. A collection of the most frequently used instructions is listed in
Table 1 above. The use of subsetting means that the dual instruction decoder
need not operate on the complete instruction set. Rather, its design can be
simplified to greatly improve the timing relationships involved. Use of
subsetting also allows the machine to quickly identify the two instructions,
decode them rapidly, and then execute them in parallel.

The next important restriction on parallel execution of instructions is that
there be no register dependencies between the paired instructions. This
means that the destination register of the first instruction cannot be used as
the source, destination, base, or index of the next instruction. This
requirement holds for both explicit and implicit use of registers by the instructions.
(Note that the exception to this is the pairing of "push" and "pop" instructions
together, for which, in the currently preferred embodiment, there exists special
hardware in the segmentation unit which updates the stack pointer.) For
dependency checking, using any part of the 16/32-bit register is the same as
using the entire register. If there is a memory dependency between the u- and
v-pipe instructions, i.e., both u- and v-pipe memory accesses are to the same
bank/address of the data cache, the v-pipe cycle will be stalled until the u-pipe
access has finished.
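The register rules above (explicit and implicit use, partial registers counting as whole registers, and the push/pop exception) can be sketched as follows. The alias table, the function names and the data-cache bank geometry (eight banks interleaved on 4-byte boundaries) are assumptions made for illustration; the text only states that accesses to the same bank or address stall the v-pipe.

```python
# Hypothetical alias table: each 8/16-bit name maps to its 32-bit parent register.
PARENT = {
    "al": "eax", "ah": "eax", "ax": "eax",
    "bl": "ebx", "bh": "ebx", "bx": "ebx",
    "cl": "ecx", "ch": "ecx", "cx": "ecx",
    "dl": "edx", "dh": "edx", "dx": "edx",
    "si": "esi", "di": "edi", "bp": "ebp", "sp": "esp",
}
STACK_OPS = {"push", "pop"}


def full_reg(name: str) -> str:
    """Treat any part of a register as the whole 32-bit register."""
    name = name.lower()
    return PARENT.get(name, name)


def register_dependency(u_writes, v_uses, u_op="", v_op=""):
    """True if a register written by the u-pipe instruction is used by the v-pipe one.

    u_writes: registers written by the u-pipe instruction, explicitly or implicitly.
    v_uses:   registers the v-pipe instruction uses as source, destination, base or index.
    """
    written = {full_reg(r) for r in u_writes}
    used = {full_reg(r) for r in v_uses}
    if u_op in STACK_OPS and v_op in STACK_OPS:
        # push/pop pairs: stack-pointer updates are handled by dedicated hardware.
        written.discard("esp")
    return bool(written & used)


# Assumed bank geometry, for illustration only.
NUM_BANKS = 8
BANK_BYTES = 4


def same_bank(u_addr: int, v_addr: int) -> bool:
    """True if both accesses fall in the same assumed data-cache bank (v-pipe must wait)."""
    return (u_addr // BANK_BYTES) % NUM_BANKS == (v_addr // BANK_BYTES) % NUM_BANKS
```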

Other restrictions which are peculiar to the current implementation of
the i586™ microprocessor include:

• The v-pipe instruction cannot have a prefix, except 0F Jcc.

• The end bit markers in the code cache corresponding to the first
instruction must be properly set.

• There must be enough opcode bytes in the prefetch buffers to decode
both instructions.

• An instruction in the u- or v-pipe can have either a displacement or
an immediate, but not both.

• There cannot be ADC and SBB instructions in the v-pipe (to avoid
dependency on the u-pipe carry flag).
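The additional restrictions can be collected into a single predicate. This is a sketch under stated assumptions: the `InstrInfo` fields are hypothetical, and the prefetch-buffer condition is reduced to comparing a byte count against the combined instruction length.

```python
from dataclasses import dataclass


@dataclass
class InstrInfo:
    """Hypothetical per-instruction facts needed for the extra checks."""
    mnemonic: str
    length: int                    # encoded length in bytes
    has_prefix: bool = False
    is_0f_jcc: bool = False        # conditional jump using the 0F escape byte
    has_displacement: bool = False
    has_immediate: bool = False


def extra_checks_pass(u: InstrInfo, v: InstrInfo, end_bit_set: bool, bytes_buffered: int) -> bool:
    """Apply the additional implementation-specific pairing restrictions."""
    if v.has_prefix and not v.is_0f_jcc:       # v-pipe: no prefix except 0F Jcc
        return False
    if not end_bit_set:                         # end bit of the first instruction must be set
        return False
    if bytes_buffered < u.length + v.length:    # assumption: simple byte-count check
        return False
    if any(i.has_displacement and i.has_immediate for i in (u, v)):
        return False                            # displacement or immediate, not both
    if v.mnemonic in ("adc", "sbb"):            # avoid dependency on the u-pipe carry flag
        return False
    return True
```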

With reference now to Figure 3, there is shown a block diagram of a
dual instruction decoder illustrating the broad concept of the present
invention. In Figure 3 the u-pipe opcode and v-pipe opcode are coupled to
decoders 12 and 14, respectively. In addition to being coupled to decoder 12,
the u-pipe opcode is also coupled to an additional decoder 13. Decoders 12,
13 and 14 comprise ordinary programmable logic arrays (PLAs) which do all
the decoding of the instructions. For example, decoder 12 generates the first
vector of microcode for the u-pipe instruction, while decoder 14 has a similar
PLA that decodes the first vector of microcode for the v-pipe instruction. Each
of the microcode vectors comprises fields which contain information such as
the source register location, the destination register, ALU operation
information, address computation, and displacement/immediate data.

Of course, the central feature of the invention is that the dual decoder
illustrated in Figure 3 is capable of issuing either one or two instructions in a
single cycle. Since one object of the present invention is to be able to work on
all of the x86 family of instructions, the decoder of Figure 3 is divided into two
sections or paths. One path is capable of handling all instructions in the x86
instruction set, while the other path is aimed specifically at handling a second
instruction in parallel. In other words, the present invention includes a mode
of operation whereby one instruction is executed per clock cycle if the
conditions for superscalar operation are not met.

With continuing reference to Figure 3, the two pipes are very similar
except for the fact that the u-pipe functions as the default pipeline when
superscalar conditions are not met. This means that the u-pipe path in Figure
3 is capable of executing all x86 instructions, whereas the v-pipe is aimed
only at a subset of the full instruction set. For example, decoders 13 and 14
are specifically designed to decode only a subset of the full x86 instruction set.
On the other hand, decoder 12 is capable of decoding the full instruction set
when the machine defaults to one instruction per clock cycle. In any
conceptual sequence, the u-pipe always represents the first instruction in the
sequence and the v-pipe always represents the second instruction in the
same sequence.

Register dependency checking is performed by unit 19, which receives
outputs from decoders 13 and 14. The outputs of decoders 13 and 14 (which
are coupled to unit 19) include information which indicates the destination
register of the current instruction. Ordinary logic in unit 19 determines whether
a dependency exists in the destination register for each instruction by
identifying the destination register of the u-pipe and ensuring that it is not used
in the v-pipe instruction. At the same time that the register dependency check
is being performed, there is also a length calculation which is performed by
unit 17. In other words, unit 17 calculates the length of the pair of instructions,
i.e., the sum of the u-pipe plus the v-pipe instructions. Unit 15 only calculates
the length of the u-pipe instruction.

Conceptually, the outputs of units 15 and 17 are coupled to a
multiplexor 21 which outputs the instruction's length. Multiplexor 21 is
controlled by a signal ISELTWO, which provides basic "issue one/issue two"
information output from register dependency check unit 19. The signal
ISELTWO is the same signal that is used to conditionally execute the v-pipe
instruction by controlling MUX 22. When register dependency check unit 19
determines that only one instruction can be executed, MUX 21 is controlled
such that the instruction length is whatever the length of the u-pipe vector is.
In the v-pipe, when only one instruction is issued, the control signal ISELTWO
selects no operation ("NOP") to be output by multiplexor 22. For such a
situation the length comes solely from the u-pipe.

When there is no register dependency, two instructions can be
executed in parallel. For this condition, the instruction length output by MUX
21 is selected to be the same as the length calculation of the u- and v-pipes
together (i.e., the output of unit 17). For this condition, the machine essentially
sees the pair as one large instruction. When two instructions are executed in
parallel, MUX 22 simply passes the v-pipe microcode vector through to its
output. The information at the outputs of MUXes 21 and 22 is
coupled to the execution engine of the microprocessor. The execution engine
normally comprises the address computation unit, the arithmetic logic unit
(ALU), data paths, register files, etc.
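The selection performed by MUX 21 and MUX 22 can be modeled in a few lines. In this sketch strings stand in for microcode vectors, the lengths assume the usual 2-byte and 3-byte IA-32 encodings of `mov edx, [ebx]` and `add esi, 4`, and `iseltwo` folds the subset check into the issue-two signal; the text itself attributes ISELTWO to dependency check unit 19, so that combination is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass

NOP_VECTOR = "NOP"


@dataclass
class DecoderOutputs:
    """Hypothetical bundle of the values feeding the multiplexers in Figure 3."""
    u_vector: str     # first microcode vector from decoder 12
    v_vector: str     # first microcode vector from decoder 14
    u_length: int     # unit 15: length of the u-pipe instruction
    pair_length: int  # unit 17: length of the u- plus v-pipe instructions


def iseltwo(u_dest, v_regs, u_in_subset, v_in_subset):
    """Assumed 'issue two' condition: both in the subset and no register dependency."""
    return u_in_subset and v_in_subset and u_dest not in v_regs


def mux_outputs(d: DecoderOutputs, issue_two: bool):
    """Return (u vector, v vector, length) as selected by MUX 22 and MUX 21."""
    length = d.pair_length if issue_two else d.u_length   # MUX 21
    v_out = d.v_vector if issue_two else NOP_VECTOR        # MUX 22
    return d.u_vector, v_out, length
```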

It should be emphasized that all the operations performed by the dual 
instruction decoder of Figure 3 are done within one clock cycle. That is, the 
opcodes are coupled to the inputs of the upper decoders and the vectors are 
provided by the multiplexing units all within a single clock cycle. 

PAIRING EXAMPLES

To better understand and appreciate the present invention, consider
some of the following examples of sequences of instructions. These
sequences also illustrate the important pairing rules discussed earlier. (Note
that in the format provided, the destination operand is on the left.)
Consider the following pair of simple instructions.

mov edx, [ebx];
add esi, 4;

In this example, the first instruction in the sequence is a "mov", which
will be handled by the u-pipe. The destination register for the u-pipe
instruction is edx. Since edx is not used in any manner in the v-pipe "add"
instruction, and furthermore, since both instructions are within the instruction
subset (see Table 1), the decoder of Figure 3 can issue both in parallel.

During execution, the first instruction opcode (i.e., "mov") is the u-pipe
opcode, while "add" is the v-pipe opcode. Upper decoder 12 decodes the
"mov" instruction and produces a u-pipe vector which specifies that the
destination register is edx. The decoder also specifies that a memory read is
necessary, where the address is specified by ebx. Decoder 12 also
identifies the components of the address calculation and determines that it is a
one-vector macro instruction. At the same time, the subset decoder 13 looks
to see whether "mov" is within the subset of the instructions that are eligible for
dual issuance. Decoder 13 also identifies the destination register edx, which
will be used by register dependency checking unit 19. Unit 19 will check edx
versus esi and conclude that they are not the same.

On the v-pipeline side, the v-pipe decoder 14 looks at "add esi, 4" and
identifies esi as the destination register (for this example, esi is also the
source). Decoder 14 also identifies the immediate component (i.e., 4), and
then unit 17 computes the length of the two instructions. The "add" instruction
then proceeds down the v-pipe. Thus, because both instructions are simple
and there are no dependencies between them, these two instructions may be
paired.

Next, consider the following pairing example. 

mov edx, [ebx];
add edx, 4; 



In this example the destination register for the u-pipe instruction is edx.
Since this destination register is also used in the v-pipe instruction, the
dependency checking logic determines that both instructions cannot be
issued in parallel. For this case, the u-pipe instruction is issued first, while the
v-pipe path remains dormant (i.e., the v-pipe issues a "NOP"). In the following
clock cycle, the "add" instruction is executed in the u-pipe. It should be
understood that the add instruction issued in the u-pipe during the next clock
cycle may possibly be issued in parallel with whatever instruction follows it in
the sequence. In the event that both instructions are issued in parallel, the
next instruction in the sequence (following the "add" instruction) will be issued
in the v-pipe.

Now consider the following example. 

lds [ebx];
push eax; 

In this example the load instruction "lds" is not included in the subset of
instructions eligible for parallel execution (see Table 1). Consequently, the
lds instruction gets issued in the u-pipe, and in the following clock cycle the
"push" instruction gets issued in the u-pipe. In this situation, the u-pipe upper
decoder 13 identifies the lds instruction as not being in the eligible subset.
This is the case even though there is no dependency between the two
instructions. It is important to note that when two instructions cannot be issued
in parallel, the v-pipe opcode becomes the u-pipe opcode for the next clock
cycle. The next instruction in the sequence then becomes the v-pipe opcode.
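The issue sequence described in these examples can be simulated with a short loop. The sketch assumes a tiny pairable subset and hand-written register sets (the operands of "lds" are not given in the text, so its destination is left as None); `issue_stream` is a hypothetical name. When a pair cannot be issued, the would-be v-pipe instruction slides into the u-pipe on the next clock.

```python
from typing import List, Optional, Set, Tuple

# (mnemonic, destination register or None, registers used); operands simplified.
Instr = Tuple[str, Optional[str], Set[str]]

# Tiny illustrative subset (assumption); a real decoder would use all of Table 1.
PAIRABLE = {"mov", "add", "push", "pop"}


def issue_stream(program: List[Instr]) -> List[Tuple[str, str]]:
    """Return one (u-pipe, v-pipe) mnemonic pair per clock.

    If the current pair cannot be issued together, the v slot gets "NOP" and the
    would-be v-pipe instruction becomes the u-pipe instruction of the next clock.
    """
    clocks = []
    i = 0
    while i < len(program):
        u = program[i]
        v = program[i + 1] if i + 1 < len(program) else None
        pairable = (
            v is not None
            and u[0] in PAIRABLE
            and v[0] in PAIRABLE
            and (u[1] is None or u[1] not in v[2])
        )
        if pairable:
            clocks.append((u[0], v[0]))
            i += 2
        else:
            clocks.append((u[0], "NOP"))
            i += 1
    return clocks


if __name__ == "__main__":
    mov_edx = ("mov", "edx", {"edx", "ebx"})
    print(issue_stream([mov_edx, ("add", "esi", {"esi"})]))
    print(issue_stream([mov_edx, ("add", "edx", {"edx"}), ("mov", "esi", {"esi", "ecx"})]))
    print(issue_stream([("lds", None, {"ebx"}), ("push", "esp", {"eax", "esp"})]))
```

The first program issues as a single pair; the second issues "mov" alone and then pairs "add" with the following "mov"; the third issues "lds" and "push" in successive clocks.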

It should be understood that the particular list of instructions in the
subset of x86 instructions in the currently preferred embodiment may vary in
different alternative embodiments. At the same time, various embodiments
may permit pairing of certain instructions for which there is an implicit
dependency if there exists special hardware to allow both instructions to be
issued and executed in parallel. By way of example, the currently
preferred embodiment includes special hardware which allows it to execute
the following instructions in parallel:
cmp edx, 0;
jnz loop;

For the above example, although there is an implicit dependency on
the Z flag, the microprocessor includes special hardware to allow these
instructions to be issued and executed in parallel.
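One way to picture this exception is to treat the flags as an implicit register and waive a flags conflict when the v-pipe instruction is a conditional jump. The sketch assumes that generalization from the single cmp/jnz example; the jump list and the function name are illustrative only.

```python
# A few conditional jumps, for illustration only (not an exhaustive list).
COND_JUMPS = {"jz", "jnz", "je", "jne", "jc", "jnc", "jl", "jge"}


def blocks_pairing(u_writes, v_op, v_reads):
    """True if a dependency between the pair prevents dual issue.

    "flags" is modeled as an implicit register. Assumption: a flags dependency is
    waived when the v-pipe instruction is a conditional jump, as in cmp/jnz;
    any other shared register still blocks pairing.
    """
    conflict = set(u_writes) & set(v_reads)
    if v_op in COND_JUMPS:
        conflict.discard("flags")
    return bool(conflict)
```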

For pairing two floating point instructions, the last three conditions listed
for integer instruction pairing, along with the memory dependency checking, do
not apply in the current embodiment. The reason for this is that floating point
instructions do not have immediate bytes. Also, register dependency is
allowed between the u-pipe instructions and the FXCH instruction in the
v-pipe. Since FXCH is a register-register instruction, memory dependency does
not apply. The subset of floating point instructions that can be paired in either
pipe is listed below in Table 2 for the currently preferred embodiment. All of
them are one-vector instructions.

Table 2

Floating Point Instruction Subset For Pairing

u-pipe instructions | v-pipe instructions
fsub                | fxch
fsqrt               |
fld                 |
fadd                |
fdiv                |
fchs                |
fmul                |
fnop                |
ftst                |
fcom                |
fucom               |
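The floating point check can be sketched in the same way. The sketch assumes FXCH is the only v-pipe entry of Table 2 and uses a list-based model of the x87 register stack (index 0 is ST(0)) to show that a paired FADD/FXCH must produce the same result as sequential execution; all names are hypothetical.

```python
# Assumed reading of Table 2: FXCH is the only v-pipe entry.
FP_U_PIPE = {"fsub", "fsqrt", "fld", "fadd", "fdiv", "fchs", "fmul", "fnop", "ftst", "fcom", "fucom"}
FP_V_PIPE = {"fxch"}


def fp_can_pair(u_op: str, v_op: str) -> bool:
    """Register dependencies with FXCH are allowed, so only subset membership is checked."""
    return u_op in FP_U_PIPE and v_op in FP_V_PIPE


def fadd_st_sti(stack, i):
    """FADD ST, ST(i) on a list-based register stack (index 0 is ST(0))."""
    out = list(stack)
    out[0] = out[0] + out[i]
    return out


def fxch_sti(stack, i):
    """FXCH ST(i): exchange ST(0) and ST(i)."""
    out = list(stack)
    out[0], out[i] = out[i], out[0]
    return out


if __name__ == "__main__":
    stack = [1.0, 2.0, 5.0]
    # The paired result must equal sequential execution: FADD first, then FXCH.
    print(fp_can_pair("fadd", "fxch"), fxch_sti(fadd_st_sti(stack, 1), 2))  # True [5.0, 2.0, 3.0]
```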







Whereas many alterations and modifications to the present invention
will no doubt become apparent to a person of ordinary skill in the art after
having read the foregoing description, it is to be understood that the particular
embodiments shown and described by way of illustration are in no way
intended to be considered limiting. For example, although this disclosure has
shown a particular set of conditions and rules to be satisfied, other conditions
may also be relied upon without detracting from the spirit or scope of the
present invention. Therefore, reference to the details of the illustrated
diagrams is not intended to limit the scope of the claims, which themselves
recite only those features regarded as essential to the invention.



CLAIMS



1. A computer system capable of executing two computer
instructions in parallel comprising:

first and second instruction pipeline means for executing computer
instructions, said first pipeline means being operable to execute any
instruction issued from a predetermined instruction set, said second pipeline
only being operable to execute a subset of instructions from said
predetermined instruction set;

dependency checking means for identifying the destination register of
the first instruction in a sequence of instructions, said checking means also
determining whether said destination register is used during the execution of
the second instruction in said sequence, if not, said dependency checking
means indicating that a first condition is met;

means for determining whether said first and second instructions in
said sequence are within said subset, when both of said first and second
instructions are within said subset, said determining means indicating that a
second condition has been met;

instruction pairing means for issuing said first and second instructions
in parallel to said first and second pipeline means whenever said first and
second conditions are satisfied.

2. The computer system of Claim 1 wherein said first and second
instructions are issued in a single clock cycle whenever said first and
second conditions are satisfied.

3. The computer system of Claim 1 wherein said instruction
pairing means only issues said first instruction in said sequence to said first
pipeline means whenever either of said first or said second conditions are
not met.

4. The computer system of Claim 3 further comprising a means for
calculating the combined length of said first and second instructions.

5. The computer system of Claim 4 wherein said first and second
instructions each comprise integer instructions.

6. The computer system of Claim 4 wherein said first and second
instructions each comprise floating point instructions.

7. A computer system including a CPU for running a program
consisting of a sequence of instructions selected from an instruction set, and
an execution engine for executing said instructions, said system comprising:

first and second instruction pipeline means for executing said
sequence of instructions wherein multiple instructions are simultaneously
overlapped in execution;

decoder means for decoding a pair of instructions from said sequence
to produce first and second microcode vectors for execution in said first and
second instruction pipelines, respectively;

said decoder means including a means for determining whether said
first and second instructions are included within a predetermined subset of
said instruction set;

calculating means for calculating the length of said first instruction
and the combined length of said first and second instructions;

means for determining that there is no register dependency between
said pair of instructions, said determining means producing a signal
whenever a dependency exists;

multiplexing means for issuing said first and second microcode
vectors and said combined length to said execution engine whenever said
signal is received and said first and second instructions are included within
said predetermined subset, said multiplexing means issuing said first
microcode instruction in said length otherwise.

8. The system of Claim 7 wherein said first and second
instructions are executed within one clock cycle of said system whenever
said signal is received by said multiplexing means and said first and second
instructions are included within said predetermined subset.

9. The system of Claim 8 wherein said first and second
instructions comprise integer instructions.

10. The system of Claim 8 wherein said first and second 
instructions comprise floating point instructions. 